Modern optimization is a partnership between high-level algorithmic choice and low-level machine awareness. While Asymptotic Efficiency defines theoretical limits, the Performance Imperative demands we tackle constant factors that compilers cannot resolve alone.
1. The Hierarchy of Optimization
Success follows an ordered process: first, eliminate asymptotic inefficiency (e.g., $O(N^2) \to O(N)$). Next, address Optimization Blockers—primarily Memory Aliasing and procedure call overhead (such as the repeated bounds checking performed by get_vec_element on every loop iteration).
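The procedure-call overhead above can be sketched as follows. This is a minimal, hypothetical vector ADT (the struct layout and `combine1`/`combine2` names are assumptions, not from the original): the naive loop pays for a bounds check and two function calls per element, while the transformed loop hoists the length computation and reads the data directly.

```c
#include <stddef.h>

/* Hypothetical vector ADT, modeled on the get_vec_element example. */
typedef struct {
    size_t len;
    long *data;
} vec;

size_t vec_length(const vec *v) { return v->len; }

/* Bounds-checked accessor: safe, but costly when called every iteration. */
int get_vec_element(const vec *v, size_t idx, long *dest) {
    if (idx >= v->len)
        return 0;
    *dest = v->data[idx];
    return 1;
}

/* Naive: procedure calls and bounds checks repeated inside the loop. */
long combine1(const vec *v) {
    long sum = 0;
    for (size_t i = 0; i < vec_length(v); i++) {
        long val;
        if (get_vec_element(v, i, &val))
            sum += val;
    }
    return sum;
}

/* After code motion: hoist the length, access the data array directly. */
long combine2(const vec *v) {
    long sum = 0;
    size_t n = vec_length(v);
    const long *data = v->data;
    for (size_t i = 0; i < n; i++)
        sum += data[i];
    return sum;
}
```

Both functions compute the same result; the difference shows up only in the constant factor per element.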
2. Data-Flow & Constraints
Compilers must be conservative to remain safe: they will not optimize a loop if a pointer *dest might overlap (alias) the vector's data, since every store through *dest could then change a subsequent read. We measure real-world speed via Cycles Per Element (CPE): fit run time against element count $n$ by least squares, giving a model of the form $T \approx \text{overhead} + \text{CPE} \cdot n$, where the fixed overhead shifts the execution curve without changing its slope.
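The aliasing blocker can be made concrete with a sketch (the function names here are illustrative assumptions). In the first version the compiler must assume `dest` may point into `a`, so it re-reads and re-writes memory on every iteration; the second version accumulates in a local variable, which removes the possible dependence and lets the value live in a register.

```c
#include <stddef.h>

/* Aliasing blocker: if dest might point into a, the compiler must
 * perform a store and a reload of *dest on every iteration. */
void sum_to_dest(const long *a, size_t n, long *dest) {
    *dest = 0;
    for (size_t i = 0; i < n; i++)
        *dest += a[i];          /* memory traffic each iteration */
}

/* Local accumulator: no possible aliasing with a, so the running
 * sum can stay in a register until the final store. */
void sum_local(const long *a, size_t n, long *dest) {
    long acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += a[i];
    *dest = acc;
}
```

In C, annotating the parameters with `restrict` is another way to promise the compiler that no aliasing occurs.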
3. Hardware Realities
Optimization requires understanding the processor's Retirement Unit and the Critical Path through a computation. Even simple loops are limited either by the Throughput Bound of the functional units or by the Latency Bound of their dependency chains.
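A common way to move a loop from its latency bound toward its throughput bound is to split one long dependency chain into several independent ones (the function names below are illustrative assumptions):

```c
#include <stddef.h>

/* One dependency chain: each multiply must wait for the previous
 * result, so the loop runs at the multiplier's latency. */
long prod_serial(const long *a, size_t n) {
    long p = 1;
    for (size_t i = 0; i < n; i++)
        p *= a[i];
    return p;
}

/* Two parallel accumulators (2x2 unrolling): two independent chains
 * can keep the functional unit busy on alternating iterations,
 * shortening the critical path roughly in half. */
long prod_2x2(const long *a, size_t n) {
    long p0 = 1, p1 = 1;
    size_t i;
    for (i = 0; i + 1 < n; i += 2) {
        p0 *= a[i];
        p1 *= a[i + 1];
    }
    for (; i < n; i++)          /* handle a leftover odd element */
        p0 *= a[i];
    return p0 * p1;
}
```

Adding more accumulators helps only until the functional unit's throughput bound is reached, after which further unrolling buys nothing.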